INCREASING PROCESSOR UTILIZATION DURING PARALLEL COMPUTATION RUNDOWN


William H. Jones 

National Aeronautics and Space Administration 
Lewis Research Center 
Cleveland, Ohio 44135 


SUMMARY 

Some parallel processing environments provide for asynchronous execution
and completion of general purpose parallel computations from a single
computational phase. When all the computations from such a phase are
complete, a new parallel computational phase is begun. Depending upon the
granularity of the parallel computations to be performed, there may be a
shortage of available work as a particular computational phase draws to a
close (computational rundown). This can result in wasted computing resources
and in delay of the overall problem.

In many practical instances, strict sequential ordering of phases of
parallel computation is not totally required. In such cases, the "beginning"
of one phase can be correctly computed before the "end" of the previous phase
is completed. This allows additional work to be generated somewhat earlier to
keep computing resources busy during each computational rundown. This paper
identifies the conditions under which this can occur, reports how frequently
such overlapping occurs in an actual parallel Navier-Stokes code, suggests a
language construct, and discusses possible control strategies for managing
such computational phase overlapping.


INTRODUCTION 

General purpose parallel computations are usually divided into phases
that must execute sequentially in order to guarantee algorithmic integrity.
For instance, the checkerboard approach to the successive over-relaxation
solution of the potential field problem divides into two such phases: the
"odd" locations phase and the "even" locations phase. At the parallel phase
level, the iterated values of the previous phase must be complete before the
new values of the next phase can be correctly computed.

In the checkerboard algorithm, the execution time of each location is
definite (nominally, the time for four additions and a divide). Thus, the
distribution of work among processors can be accurately planned. Under ideal
conditions (involving the number of checkerboard locations in comparison to
the number of processors), the distribution of work can be arranged so that
each processor receives an exactly even share of the work and, as a
consequence, each processor completes its work at exactly the same time.
Perfect computation resource utilization is realized (at least in a practical
sense) since the next computational phase can begin immediately.

Unfortunately, ideal conditions are infrequently found in real
applications. Continuing with the checkerboard algorithm, consider the
situation when the potential grid is 1024 points on a side (2**20 grid points)
and 1000 processors are available. Each computational phase will provide
524,288 individual computations, or 524 computations for each of the 1000
processors; however, 288 computations will be left over for distribution
among the 1000 processors. This will leave 712 processors with nothing to do
while the final 288 computations are carried out.
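The rundown arithmetic in this example can be reproduced with a short Python sketch. The function name `rundown` is hypothetical, and the sketch assumes, as the example does, that every location takes the same time and that the grid splits evenly into the two checkerboard phases.

```python
def rundown(grid_side: int, processors: int) -> tuple:
    """Return (computations per phase, full rounds, leftover, idle processors)
    for one checkerboard phase, assuming uniform computation times."""
    # The checkerboard splits the grid into "odd" and "even" halves,
    # so each phase holds half of the grid points.
    per_phase = grid_side * grid_side // 2
    # Every processor gets `full_rounds` computations; `leftover` remain.
    full_rounds, leftover = divmod(per_phase, processors)
    # In the final, partial round only `leftover` processors have work.
    idle = processors - leftover if leftover else 0
    return per_phase, full_rounds, leftover, idle


print(rundown(1024, 1000))  # (524288, 524, 288, 712)
```

With 1024 processors the same grid divides evenly (512 computations each) and no processor idles, which is the ideal case described earlier.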

The author's experience suggests that even this example is optimistic.
Most computations carried out by the author's parallel Navier-Stokes solver
(the Combined Aerodynamic and Structural Dynamic Problem Emulating Routines,
or CASPER (ref. 1), which was controlled by the Parallel, Asynchronous
Executive, or PAX (ref. 2)) could not even be assigned definite execution
times. In some instances, whether the computation was to be carried out at
all was itself a conditional part of the algorithm. No control over the
computation-count-to-processor ratio was attempted; processors were allocated
as they became available on a the-more-the-merrier basis. Also, shared
information access times were unpredictable and unrepeatable from instance to
instance. As a result, there was no assurance that individual processors
could be kept busy as a particular computational phase drew to a close.

The PAX/CASPER project provided the experience base cited later in this
paper. PAX/CASPER was focused on a parallel, general purpose, Navier-Stokes
solver. Thus, this experience base is presented not as a grand generalization
for all of parallel processing, but as a specific example in practical
parallel processing.

Certain other situations that might seem of interest in the overlapping
of computational phases (for instance, the possibilities for overlapping in a
tight iterative loop) are not treated, for the simple reason that they did not
occur in the PAX/CASPER project. PAX/CASPER was not so much a research project
in parallel processing as an exploratory development of a far-term aerodynamic
tool. Thus, the motivation was to solve the problems that occurred rather
than the problems that one could imagine.

It has been suggested that scheduling and overhead will be a particular
problem in PAX/CASPER. So far, this has not been the case. Operational
experience shows that the ratio of computation to management has been running
at something in the neighborhood of 200 to 1. This paper is an effort to chart
a method of improving upon this situation so as to stave off any backsliding
that might occur as the ratio of computational to management resources
increases. Additional strategies have been identified for development. These
include a middle management scheme to parallelize the serial management
function, a direct worker-to-worker lateral communication scheme, and a
data-proximity work assignment algorithm. These strategies, combined with the
overlapping of computational phases, should reduce management overhead.

Various solutions to the computational rundown problem may be acceptable.
Some parallel processing schemes for general purpose computation may choose
simply to accept the lower processor utilization as a minor design flaw.

Another alternative is to create a multi-parallel-job-stream environment that
allows computational work of one job stream to fill in when another job stream
enters a computational rundown situation. This will bring processor
utilization up; however, it should be recognized that the primary goal of
parallel processing is to reduce elapsed wall-clock time for a given job. The
introduction of such a "batch" environment will inevitably distribute
processor resources among the several job streams and, thus, reduce the total
processing power on any particular job and lengthen its elapsed wall-clock
time.


Overlapping Computational Phases 

The goal, then, is to find more ready-to-compute work from the parallel
algorithm that is being computed. As mentioned previously, this is not
possible at the parallel phase level: each phase must be completed before the
next is begun in order to guarantee algorithmic integrity. However, if an
examination is made at a deeper level (the sub-phase or, in the author's
terminology, task level), it is frequently discovered that the completion of
portions (tasks) of one phase will allow the correct computation of portions
(tasks) of the succeeding phase.

Consider again the checkerboard algorithm. If all the "odd" locations
adjacent to a particular "even" location have been updated with new values
from the current computational phase, then the new value for that "even"
location for the next computational phase can be correctly computed.
Additionally, since all the computations requiring the current value of that
"even" location as an input have been completed, the value for that "even"
location can be updated without affecting the results of the current
computational phase.
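The readiness test described in this paragraph can be sketched in Python. The helper names are hypothetical; the sketch assumes a four-point stencil (matching the four additions per location), treats every grid point alike rather than holding boundary values fixed, and labels a location "odd" when the sum of its indices is odd. Under that stencil, the odd neighbors of an even location are also exactly the current-phase computations that read it, so one check covers both conditions in the text.

```python
def neighbors(i, j, n):
    """Four-point stencil neighbors of (i, j) inside an n-by-n grid."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < n and 0 <= b < n:
            yield a, b


def is_odd(i, j):
    # Assumed convention: a location is "odd" when i + j is odd.
    return (i + j) % 2 == 1


def even_ready(i, j, n, updated_odd):
    """True once every odd neighbor of even location (i, j) holds its new
    current-phase value. Those neighbors are also the only current-phase
    computations that read (i, j), so (i, j) may now be updated for the
    next phase without disturbing the current one."""
    if is_odd(i, j):
        raise ValueError("(i, j) must be an even location")
    return all(p in updated_odd for p in neighbors(i, j, n))


updated = {(0, 1), (1, 0)}            # odd locations finished so far
print(even_ready(0, 0, 4, updated))   # True: the corner's two neighbors are done
print(even_ready(1, 1, 4, updated))   # False: (2, 1) and (1, 2) are pending
```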

At this point, it is necessary to make certain assumptions (or,
alternatively, to set certain system design constraints) about the nature of
computational phase rundown. Two basic situations arise: one in which task
assignments and releases are statically determined and one in which they are
dynamically determined.

The static situation is much simpler from the standpoint of next-phase
task release timing since everything is determined ahead of time. In this
case, it can be acceptable for computational rundown to begin almost
immediately, since the scheduling of the next-phase task has already been
statically determined. No completion processing of current-phase tasks is
required to schedule the release of the next-phase task. (In fact, work in
this area for the purposes of real-time simulation has been conducted for some
years at NASA Lewis (refs. 3 and 4).)

The dynamic scheduling situation is substantially more interesting. Some
time delay must be available between the completion of the first
current-phase tasks and the onset of computational rundown. This delay is
needed to provide time to process the completion of the early current-phase
tasks and, in so doing, to schedule the next-phase tasks that are thus
enabled. During this delay, there must be enough current-phase tasks to keep
the processing resources busy in order to avoid a computational load dip
while the next-phase tasks are scheduled.

In the dynamic scheduling situation, enablement relationships between the
current-phase tasks and the next-phase tasks (i.e., the relationship that
enables a next-phase task based upon the completion of a current-phase task)
may be either static or dynamic. That is, the completion of a particular
current-phase task may always enable the same next-phase task (the static
enablement case), or it may enable some next-phase task that can only be
identified at the time of execution (the dynamic enablement case). The nature
of the enablement relationships is important because it helps determine the
time delay from the completion of the first current-phase tasks to the
availability of the first enabled next-phase tasks.

Considering these characteristics of the dynamic scheduling situation
(i.e., the time to process current-phase task completion, the time to
recognize enablement relationships, and the time to schedule enabled
next-phase tasks), it can be observed that the number of tasks should
substantially exceed the number of processors. Certainly, at the outset of the
current-phase work there should be at least two tasks for each processor, so
that at least one task execution time will be available to process the
completion of the first task assigned to the processor and to schedule the
enabled next-phase task. This presumes that completion processing and task
scheduling time is small with respect to task execution time. In particular,
it assumes that one such completion, enablement, and scheduling cycle for each
of the processors in the system can be completed in a single task execution
time. (The author's experience with PAX suggests that this is reasonable even
for dynamic, managerial-style parallel processing systems. Systems that use
hardware-level synchronization primitives presumably would be at an even
greater advantage in this area.)
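The two sizing conditions in this paragraph can be written down as a rough Python check. The function `overlap_feasible` and its parameters are hypothetical; the sketch assumes uniform task times and a single serial executive (as in PAX) that handles one completion, enablement, and scheduling cycle at a time.

```python
def overlap_feasible(tasks, processors, task_time, cycle_time):
    """Rough check of the two conditions stated above.

    Assumes uniform task times and a single serial executive that spends
    `cycle_time` on each completion/enablement/scheduling cycle.
    """
    # At least two tasks per processor at the outset, so each processor has
    # a second task to run while its first completion is being processed.
    enough_tasks = tasks >= 2 * processors
    # One cycle for every processor must fit in a single task execution time.
    fast_management = processors * cycle_time <= task_time
    return enough_tasks and fast_management


print(overlap_feasible(64, 16, task_time=200.0, cycle_time=1.0))  # True
print(overlap_feasible(20, 16, task_time=200.0, cycle_time=1.0))  # False
print(overlap_feasible(64, 16, task_time=10.0, cycle_time=1.0))   # False
```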

The conditions under which this overlapping of computational phases can
correctly occur are the same as those that allow parallel computations within
a particular phase. Let the logical predicate PARALLEL(x,y) return the value
TRUE when x and y are such that parallel computations are allowed.

Clearly, PARALLEL(n,m) must always be TRUE if n and m are distinct
computational granules of the same parallel computational phase. Let q be an
uncompleted granule of the current phase and r be a granule of the next phase
that has been enabled by some completed granule, p, of the current phase. If
PARALLEL(q,r) necessarily returns the value TRUE for every such q and r, then
the current phase and the next phase can be correctly overlapped.

The exact nature of the logical predicate PARALLEL(x,y) is, of course, of
substantial practical interest; however, it has no direct impact upon the
ability to overlap phases as outlined above. Different parallel systems may
identify different logical predicates.
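As one illustration, the overlap condition can be sketched in Python with a swapped-in choice of predicate: Bernstein's conditions on the data each granule reads and writes. This is an assumption for the sketch, not the predicate used in PAX. The `Granule` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Granule:
    name: str
    reads: frozenset   # data items the granule reads
    writes: frozenset  # data items the granule writes


def parallel(x: Granule, y: Granule) -> bool:
    """One possible PARALLEL(x, y): Bernstein's conditions. Neither granule
    writes anything the other reads or writes."""
    return not (x.writes & y.reads or x.reads & y.writes or x.writes & y.writes)


def can_overlap(uncompleted, enabled) -> bool:
    """Overlap is correct if PARALLEL(q, r) holds for every uncompleted
    current-phase granule q and every enabled next-phase granule r."""
    return all(parallel(q, r) for q in uncompleted for r in enabled)


# Granules of a B(I)=A(I) phase followed by a C(I)=B(I) phase.
q = Granule("B(2)=A(2)", frozenset({"A(2)"}), frozenset({"B(2)"}))
r_ok = Granule("C(1)=B(1)", frozenset({"B(1)"}), frozenset({"C(1)"}))
r_bad = Granule("C(2)=B(2)", frozenset({"B(2)"}), frozenset({"C(2)"}))
print(can_overlap([q], [r_ok]))   # True
print(can_overlap([q], [r_bad]))  # False: r_bad reads B(2), which q writes
```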


Identifying Enabled Granules 

The first challenge to be met is to find a way of identifying enabled
next-phase granules for overlapping. It is easy to postulate that some mapping
function exists, either from the set of completed granules, p, to the set of
enabled granules, r, or from the set of uncompleted granules, q, to the set of
enabled granules, r. It is very difficult to establish what this mapping
function might be in any general way. Fortunately, the mapping function is
much more easily identified when each concrete situation is faced.

First, consider the simplest imaginable case, as represented by the
following Fortran code segment:
      DO 100 I=1,N          ! First computational phase
      B(I)=A(I)             !
  100 CONTINUE              !

      DO 200 I=1,N          ! Second computational phase
      D(I)=C(I)             !
  200 CONTINUE              !


Assuming that there are no shared output area constraints, it can be
observed that these two parallel computational phases can be computed in
parallel with each other. This represents what might be called a universal
mapping function, wherein any granule of the second computational phase is
enabled by any granule or set of granules (including the null set) of the
first computational phase.

PAX/CASPER experience shows that 6 out of 22 (or 27 percent) of the
parallel computational phases allow universal mapping enablement of the
succeeding phases. These represent 266 out of 1188 lines (or 22 percent) of
the code that is executed in parallel in PAX/CASPER.
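The benefit of universal mapping can be illustrated with a small Python sketch that compares a strict phase barrier against simply queueing the second phase behind the first. All names are hypothetical, and the sketch assumes known, equal granule durations and zero management cost, which PAX/CASPER did not have; it only shows where the rundown idle time goes.

```python
import heapq


def makespan(durations, processors):
    """Greedy list scheduling: each task, in order, goes to the processor
    that becomes free first. Returns the time the last task finishes."""
    free_at = [0.0] * processors          # already a valid heap
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


def with_barrier(phase1, phase2, processors):
    # Phase 2 may not start until every phase 1 granule has finished.
    return makespan(phase1, processors) + makespan(phase2, processors)


def universal_overlap(phase1, phase2, processors):
    # Universal mapping: phase 2 granules are enabled from the outset, so
    # they simply queue behind the phase 1 granules with no barrier.
    return makespan(list(phase1) + list(phase2), processors)


p1 = p2 = [3.0] * 5                      # five equal granules per phase
print(with_barrier(p1, p2, 4))           # 12.0: one processor busy per rundown
print(universal_overlap(p1, p2, 4))      # 9.0
```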

This universal mapping usually occurs in PAX/CASPER when the nature of
the larger computational process is changing. For instance, the changeover
from power of compression computations to interpolator matrix generation is
one such change of character. The two computations do not involve shared
information of any kind and, thus, they can be entirely overlapped. Of course,
the two phases could be merged into one by a preprocessor of the parallel
control stream; however, since the mechanisms necessary to handle this case
would be a subset of those needed for the following case, it might well be
simpler to support this enablement mapping directly.

For the next case, consider the following Fortran fragment that is to be
computed in parallel as two succeeding computational phases:


      DO 100 I=1,N          ! First computational phase
      B(I)=A(I)             !
  100 CONTINUE              !

      DO 200 I=1,N          ! Second computational phase
      C(I)=B(I)             !
  200 CONTINUE              !


Again assuming that there are no shared output area constraints, it can be
observed by inspection that the identity mapping function (I=I) maps from
completed granules, p, to enabled granules, r: completion of granule I of the
first phase enables granule I of the second phase. This is also a simple and
easily identified mapping.
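The effect of releasing second-phase granules one at a time under the identity mapping can be sketched with a small event-driven simulation in Python. The function `simulate` and its `needs` argument are hypothetical; the model assumes known durations, idle processors always take the oldest ready granule, and completion processing costs nothing. A strict barrier is expressed by making every second-phase granule need all first-phase granules.

```python
import heapq
from collections import deque


def simulate(phase1, phase2, needs, processors):
    """Finish time when phase 2 granule j may start only after every phase 1
    granule listed in needs[j] has completed.

    Model assumptions: known durations, idle processors take the oldest
    ready granule, and management (completion processing) costs nothing.
    """
    remaining = [set(s) for s in needs]            # unfinished prerequisites
    ready = deque(("p1", i) for i in range(len(phase1)))
    ready.extend(("p2", j) for j, s in enumerate(remaining) if not s)
    running = []                                   # heap of (finish, kind, index)
    now, idle = 0.0, processors
    while ready or running:
        while idle and ready:                      # hand work to idle processors
            kind, k = ready.popleft()
            duration = phase1[k] if kind == "p1" else phase2[k]
            heapq.heappush(running, (now + duration, kind, k))
            idle -= 1
        now, kind, k = heapq.heappop(running)      # advance to next completion
        idle += 1
        if kind == "p1":                           # completion processing
            for j, s in enumerate(remaining):
                if k in s:
                    s.discard(k)
                    if not s:
                        ready.append(("p2", j))
    return now


p1, p2 = [4, 1, 1, 1], [1, 1, 1, 4]
barrier = [set(range(4))] * 4                       # every granule waits for all
identity = [{j} for j in range(4)]                  # granule j waits for j only
print(simulate(p1, p2, barrier, 2))                 # 9.0
print(simulate(p1, p2, identity, 2))                # 8.0
```

The linear scan over `remaining` keeps the sketch short; a real executive would index the enablement relationships instead.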

PAX/CASPER experience indicates that it applies in 9 out of 22 (or 41
percent) of the parallel computational phases (representing 551 of 1188 code
lines, or about 46 percent of the parallel code in PAX/CASPER). Combining
this direct mapping with the simpler universal mapping above indicates that
(at least in PAX/CASPER experience) about 68 percent of the parallel
computational phases (15 of 22) and of the code executed in parallel (817 of
1188 lines) can be easily overlapped to defeat computational rundown. These
two enablement mapping possibilities are the most frequently occurring
situations in PAX/CASPER experience.

The next most frequently occurring enablement mapping in PAX/CASPER
experience is what could be called null mapping, that is, the situation in
which no overlapping is possible. This occurs in 4 out of 22 (or 18 percent)
of the computational phases and represents 262 out of 1188 (or 22 percent) of
the lines of code executing in parallel. In all cases, the cause was not that
no overlap existed between the parallel computations themselves but, in fact,
that serial actions and decisions had to occur between the phases.

This is important since it allows one to assess how often the extra effort of
supporting overlapping features will be entirely defeated, regardless of the
sophistication of the overlapping phase support features.

Another enablement mapping occurring in PAX/CASPER experience is a reverse,
indirect mapping. Consider the following Fortran fragment:


      DO 10 I=1,N           ! Set up source mapping:
      DO 10 J=1,10          ! IRAND produces an integer
      IMAP(J,I)=IRAND()     ! in the range 1 to N
   10 CONTINUE              !

      DO 100 I=1,N          ! First computational phase
      A(I)=FUNC(I)          ! generates some number in A(I)
  100 CONTINUE              !

      DO 200 I=1,N          ! Second computational phase
      B(I)=0.0              ! sums subsets of the results
      DO 150 J=1,10         ! of the first computational
      B(I)=B(I)+A(IMAP(J,I)) ! phase
  150 CONTINUE              !
  200 CONTINUE              !


Clearly, this computation can be overlapped; however, determining the
enablement mapping is very difficult. This is because knowing that a
particular first phase granule is complete does not directly identify any
distinct second phase granule as computable. A reverse mapping, from a desired
second phase granule to the first phase granules it requires, is possible,
however.

In PAX/CASPER experience, this situation occurs in 2 of 22 (or 9 percent)
of the computational phases, representing 78 out of 1188 (or 7 percent) of the
lines of code executing in parallel. While this is not a frequently occurring
situation in PAX/CASPER experience, it cannot be ignored out of hand. Some
engineering judgment must be made to weigh the cost (in terms of management
overhead, computational resources transferred from workers to management,
etc.) of some reverse enablement mapping solution against the cost of
computational rundown in 9 percent of the parallel computational phases.

Certainly, a solution exists for the reverse, indirect enablement mapping.
Once the values of the information selection map (represented in the code
fragment by the array IMAP) have been determined, it is a simple matter to
produce a composite map of the first phase granules that must be completed in
order to enable a particular second phase granule. The executive can then use
this map upon each first phase granule completion to determine the
computability of particular second phase granules. This map could also be used
to direct a preferred order of first phase granule dispatching so as to enable
a known second phase granule as early as possible.
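The composite map and its two uses can be sketched in Python. The function names are hypothetical, indices are 0-based, and each row `imap[i]` stands in for the column IMAP(1..10,I) of the Fortran fragment. Besides the composite map (`needs`), the sketch keeps an inverse index (`users`) so that completion processing only visits the second phase granules that actually depend on the completed granule.

```python
from collections import defaultdict


def composite_map(imap):
    """imap[i] lists the first phase granules whose results second phase
    granule i sums (a 0-based stand-in for IMAP(J,I)). Returns (needs, users):
    needs[i] is the set of first phase granules that must complete before
    granule i is computable, and users[p] lists the second phase granules
    that wait on first phase granule p."""
    needs = {i: set(row) for i, row in enumerate(imap)}
    users = defaultdict(list)
    for i, row in needs.items():
        for p in row:
            users[p].append(i)
    return needs, users


def complete(p, needs, users):
    """Completion processing for first phase granule p: return the second
    phase granules that have just become computable."""
    enabled = []
    for i in users.get(p, ()):
        if p in needs[i]:
            needs[i].remove(p)
            if not needs[i]:
                enabled.append(i)
    return enabled


def preferred_order(target, needs, n_first):
    """Dispatch the first phase granules that `target` needs ahead of the rest."""
    first = sorted(needs[target])
    return first + [p for p in range(n_first) if p not in needs[target]]


imap = [[0, 1], [1, 2], [2, 2]]            # three second phase granules
print(preferred_order(1, composite_map(imap)[0], 3))  # [1, 2, 0]
needs, users = composite_map(imap)
print(complete(2, needs, users))          # [2]
print(complete(0, needs, users))          # []
print(complete(1, needs, users))          # [0, 1]
```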

Two important facts about this reverse enablement mapping must be noted.
First, both occurrences of this situation involved a dynamically generated
information selection map. Thus, the composite granule map would have to be
generated by the executive at or after first phase initiation but before any
second phase enablements. Second, the impact of executive computation must be
considered. In the PAX/CASPER UNIVAC 1100 test bed, executive computation was
done at the direct expense of worker computation. Thus, extensive composite
granule map generation could be self-defeating. Some real parallel machines
may provide separate executive computing resources, in which case the
generation and use of composite granule maps would not be out of the
question.


A final enablement form was observed in PAX/CASPER that could be
characterized as a forward, indirect mapped situation. Consider the following
Fortran fragment:


      DO 10 I=1,M           ! Generate forward map
      IMAP(I)=IRAND()       !
   10 CONTINUE              !

      DO 100 I=1,M          ! Use forward map to operate
      B(IMAP(I))=A(IMAP(I)) ! on a subset of the arrays
  100 CONTINUE              !

      DO 200 I=1,N          ! Perform some further
      C(I)=B(I)             ! operation on the complete
  200 CONTINUE              ! arrays


This situation is somewhat easier than the reverse, indirect mapping in
that the identification of a particular granule in the first phase can be
directly mapped to an enabled granule in the successor phase; however, much of
the complication of a mapped enablement remains. This form was the least
frequently occurring situation in PAX/CASPER, showing up only once (5 percent
of the phases) and accounting for only 31 of 1188 lines of code executed in
parallel.
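A minimal Python sketch of forward, indirect enablement follows, with hypothetical names and 0-based indices. It assumes that second phase granule k reads only B(k), so elements that no first phase granule writes are computable at once. Because IMAP may contain repeated values, it counts the outstanding writers of each element rather than assuming a one-to-one forward map.

```python
from collections import Counter


def forward_enablement(imap, n):
    """imap[i] is the element of B written by first phase granule i (a 0-based
    stand-in for IMAP(I)); second phase granule k reads B(k).
    Returns (pending, initially_enabled)."""
    pending = Counter(imap)            # writers still outstanding for each k
    # Elements that no first phase granule writes are computable at once.
    initially_enabled = [k for k in range(n) if pending[k] == 0]
    return pending, initially_enabled


def complete_forward(i, imap, pending):
    """Completion of first phase granule i maps forward to granule imap[i],
    which becomes computable once every granule writing B(imap[i]) is done."""
    k = imap[i]
    pending[k] -= 1
    return [k] if pending[k] == 0 else []


imap = [3, 1, 3]                           # M = 3 first phase granules, N = 5
pending, ready = forward_enablement(imap, 5)
print(ready)                               # [0, 2, 4]
print(complete_forward(0, imap, pending))  # []: element 3 has a second writer
print(complete_forward(1, imap, pending))  # [1]
print(complete_forward(2, imap, pending))  # [3]
```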

No other forms of enablement mapping were observed in PAX/CASPER.
Certainly, extensions of the forms already presented can be imagined.
Additionally, a seam mapping problem (such as would be appropriate for the
checkerboard approach to the successive over-relaxation problem) can be
foreseen. These other forms are beyond the scope of the present paper.
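The phase and line counts reported for the five enablement forms can be cross-checked with a small Python tally. The dictionary simply restates the figures given in this section; the names are illustrative.

```python
# Phase and line counts reported above for PAX/CASPER.
REPORTED = {
    "universal": (6, 266),
    "direct (identity)": (9, 551),
    "null": (4, 262),
    "reverse indirect": (2, 78),
    "forward indirect": (1, 31),
}


def tally(kinds):
    """Sum (phases, lines of parallel code) over the named mapping forms."""
    phases = sum(REPORTED[k][0] for k in kinds)
    lines = sum(REPORTED[k][1] for k in kinds)
    return phases, lines


total = tally(REPORTED)                   # (22, 1188): the forms cover everything
easy = tally(["universal", "direct (identity)"])
overlappable = tally([k for k in REPORTED if k != "null"])
for label, (p, l) in [("easy", easy), ("any overlap", overlappable)]:
    print(f"{label}: {p}/{total[0]} phases, {l}/{total[1]} lines")
```

The five forms account for all 22 phases and 1188 lines; the universal and direct forms together cover 15 phases and 817 lines, and the four overlappable forms together cover 18 of the 22 phases and 926 of the 1188 lines.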


Language Construction 

The developing PAX/CASPER language is simple and requires the user to
make specific statements concerning choices for the management of each
parallel computational phase. Statements involving the enablement of a
succeeding phase could be made at two times: during the definition of a
computational phase to the management system and during the invocation of the
phase for actual computations. The difficulty to be faced is that the
statements no longer apply solely to the phase being referenced, but rely also
on the characteristics of the succeeding phase.

The simplest approach is to require the user to specify the appropriate
enablement mapping method when the phase is invoked. It might appear as in
the following PAX parallel language fragment:

      DISPATCH phase-name
      ENABLE/MAPPING=option


This is simple and explicit; however, it leaves the door wide open to
user mistakes. There is no interlock between this phase and the next that can
be verified by the executive. A simple solution would be to identify the name
of the enabled next phase so that the executive system (or language
processor) can verify that that phase does, in fact, follow. This might
appear as follows:


      DISPATCH phase-name
      ENABLE [phase-name/MAPPING=option]


This allows the desired verification, but also raises a new possibility.
Occasionally, a conditional branch that is not dependent on the computational
phase separates that phase from two or more succeeding phases, each of which
may (or may not) be overlappable. If each of these phases were identified in
the above construct, the executive could preprocess the branch and overlap the
appropriate phase. This could look as follows:


      DISPATCH phase-name
      ENABLE/BRANCHINDEPENDENT
         [phase-name-1/MAPPING=option
          phase-name-2/MAPPING=option]
      IF (IMOD(LOOPCOUNTER,10).NE.0) GO TO branch-target
      DISPATCH phase-name-1
      GO TO rejoin
branch-target:
      DISPATCH phase-name-2
rejoin:
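The executive-side lookahead implied by ENABLE/BRANCHINDEPENDENT can be sketched in Python. The function `select_successor` and the mapping option names "DIRECT" and "UNIVERSAL" are hypothetical (the actual PAX option values are not given here); the branch condition is passed as a callable so that the executive can evaluate it before the current phase completes.

```python
def select_successor(enable_list, condition, taken, fall_through):
    """Hypothetical executive lookahead for ENABLE/BRANCHINDEPENDENT.

    enable_list maps each candidate successor phase to its declared MAPPING
    option. Because the branch condition does not depend on the current
    phase, the executive may evaluate it before that phase completes and
    begin overlapping with whichever successor will actually follow.
    """
    successor = taken if condition() else fall_through
    # None means no mapping was declared, so no overlap is attempted.
    return successor, enable_list.get(successor)


enable_list = {"phase-name-1": "DIRECT", "phase-name-2": "UNIVERSAL"}
loop_counter = 20
# Mirrors IF (IMOD(LOOPCOUNTER,10).NE.0) GO TO branch-target
print(select_successor(enable_list, lambda: loop_counter % 10 != 0,
                       taken="phase-name-2", fall_through="phase-name-1"))
# ('phase-name-1', 'DIRECT')
```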
Finally, the matching of mapping selections to phases and the invocation of
the appropriate overlapping services could be done when the parallel phase is
defined to the system; however, it would still be necessary to identify
preprocessable branches at the computation invocation site. This could appear
as follows:

      DEFINE PHASE phase-name
      ENABLE [
         phase-name-1/MAPPING=option
         phase-name-2/MAPPING=option
         phase-name-3/MAPPING=option
      ]

      DISPATCH phase-name
      ENABLE/BRANCHINDEPENDENT


The ENABLE/BRANCHINDEPENDENT statement would be omitted when branch
preprocessing was either not appropriate or not needed. The executive system
could perform the appropriate lookahead to see whether any of the named
succeeding phases was actually following and apply, as appropriate, the
specified enablement mapping.
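Recording the ENABLE list at definition time and consulting it at dispatch can be sketched in Python. The `PhaseTable` class and the option names are hypothetical; the sketch only shows the lookup that lets the executive verify that a declared successor actually follows before applying its mapping.

```python
class PhaseTable:
    """Hypothetical executive record of DEFINE PHASE ... ENABLE [...]."""

    def __init__(self):
        self.enables = {}  # phase name -> {successor name: mapping option}

    def define(self, phase, enables):
        self.enables[phase] = dict(enables)

    def overlap_plan(self, phase, next_phase):
        """At dispatch, look ahead at the phase that actually follows and
        return its declared mapping, or None if no overlap was declared."""
        return self.enables.get(phase, {}).get(next_phase)


table = PhaseTable()
table.define("phase-name", {"phase-name-1": "DIRECT",
                            "phase-name-2": "UNIVERSAL"})
print(table.overlap_plan("phase-name", "phase-name-2"))  # UNIVERSAL
print(table.overlap_plan("phase-name", "phase-name-3"))  # None
```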


Control Strategies 

Control strategies for enabling and scheduling overlapped parallel
computational phases are, of course, highly dependent upon the overall
parallel processing strategy. As alluded to earlier, some approaches to
parallel processing may do all of this before any computations are begun.
Indeed, the entire process may be done manually by a human being when the
pattern of parallel processing is fixed for the life of the system.

Within the PAX system, the opposite is true: the identification and
scheduling of computable granules is entirely automatic. A scheduling
mechanism for enabled computational granules already exists within the PAX
system. It was developed to schedule dynamically created computations that
conflicted (usually in terms of shared data access) with pre-existing
computational granules.

Within PAX, each internal description of one (or more) computational
granules included a queue head for a doubly linked circular list of computable
but conflicting computational granules. Upon completion of the described
computation, all the queued conflicting computations became unconditionally
computable and were placed in the waiting computation queue. The waiting
computation queue was kept in a known order and, for the purposes of the
conflicting computation problem, it was determined that such conflicting
computations would be placed ahead of the normal computations in the queue
and, thus, given higher priority.
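The conflict-queue mechanism described above can be sketched in Python. This is not the PAX code: a `collections.deque` stands in for the doubly linked circular list, and the `Description` and `Executive` classes are hypothetical. On completion, the queued conflicting computations are moved to the front of the waiting queue, keeping their own order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Description:
    name: str
    # Stands in for the queue head of PAX's doubly linked circular list.
    conflicts: deque = field(default_factory=deque)


class Executive:
    def __init__(self):
        self.waiting = deque()  # waiting computation queue, in a known order

    def submit(self, desc):
        self.waiting.append(desc)           # normal computations go last

    def queue_conflict(self, owner, desc):
        owner.conflicts.append(desc)        # computable once `owner` completes

    def complete(self, desc):
        # Released computations go ahead of normal ones (higher priority);
        # extendleft reverses its input, so reverse first to keep FIFO order.
        released = list(desc.conflicts)
        desc.conflicts.clear()
        self.waiting.extendleft(reversed(released))

    def next_task(self):
        return self.waiting.popleft() if self.waiting else None


ex = Executive()
a, b, c, d = (Description(n) for n in "abcd")
ex.submit(a)
ex.submit(b)
ex.queue_conflict(a, c)
ex.queue_conflict(a, d)
ex.complete(ex.next_task())                 # a runs and completes
print([t.name for t in ex.waiting])         # ['c', 'd', 'b']
```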

The scheduling of universally mapped successor phases within this system
is very easy indeed. At the time of phase initiation, the successor phase is
also initiated and the resulting computation description placed in the
waiting computation queue behind the current phase description.

The scheduling of directly enabled successor phases is similarly easy at
first sight. At the time of phase initiation, the successor phase is also
initiated and the resulting computation description placed in the conflicted
computation queue of the current phase description. Thus, when the current
phase computation is completed, the now-enabled successor computation will be
placed in the waiting computation queue to be considered for scheduling.
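The two initiation rules can be sketched in Python using plain dictionaries as computation descriptions. The helper names and the mapping strings "universal" and "direct" are hypothetical; under universal mapping the successor goes straight into the waiting queue behind the current phase, and under direct mapping it waits in the current description's conflicted computation queue until completion.

```python
from collections import deque


def describe(name):
    """A computation description with its own conflicted computation queue."""
    return {"name": name, "conflicts": deque()}


def initiate(waiting, current, successor, mapping):
    """Initiate `current` and, per the text, its overlapped successor."""
    waiting.append(current)
    if mapping == "universal":
        waiting.append(successor)               # behind the current phase
    elif mapping == "direct":
        current["conflicts"].append(successor)  # released on completion
    else:
        raise ValueError(f"unsupported mapping {mapping!r}")


def complete(waiting, desc):
    """Move queued successors to the front of the waiting queue, in order."""
    while desc["conflicts"]:
        waiting.appendleft(desc["conflicts"].pop())


waiting = deque()
cur, succ = describe("current"), describe("successor")
initiate(waiting, cur, succ, "direct")
print([d["name"] for d in waiting])   # ['current']
complete(waiting, waiting.popleft())
print([d["name"] for d in waiting])   # ['successor']
```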

The above approach for directly enabled successor phases is fine if each
indivisible granule of computation is described separately. Unfortunately,
this is usually not economical (in terms of storage space and task search
times, among other things) and was not the choice taken in the PAX design.
Computations were, instead, described as large, contiguous collections of
granules. The descriptions were split apart as necessary to produce
conveniently sized tasks for workers and then merged back into single
descriptions when the work was completed. This splitting of descriptions
requires that queued computation descriptions also be split, so that each
queued description will accurately reflect the enablement relationship between
the computation and its queued successor computation.
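Splitting a range description together with its queued successors can be sketched in Python. The `Desc` record and `split` function are hypothetical; the sketch assumes a direct (identity) mapping and that each queued successor covers the same granule range as its current-phase description, so both are cut at the same point.

```python
from dataclasses import dataclass, field


@dataclass
class Desc:
    """Description of granules lo..hi-1 of one phase (half-open range)."""
    phase: str
    lo: int
    hi: int
    successors: list = field(default_factory=list)  # queued successor descriptions


def split(desc, k):
    """Split desc into [lo, k) and [k, hi). Under a direct (identity) mapping,
    each queued successor is cut at the same k, so every piece keeps exactly
    the successor granules that its own completion enables."""
    if not desc.lo < k < desc.hi:
        raise ValueError("split point must fall inside the range")
    left = Desc(desc.phase, desc.lo, k)
    right = Desc(desc.phase, k, desc.hi)
    for s in desc.successors:
        # Assumes each successor covers the same granule range as desc.
        s_left, s_right = split(s, k)
        left.successors.append(s_left)
        right.successors.append(s_right)
    return left, right


whole = Desc("p1", 0, 100, [Desc("p2", 0, 100)])
left, right = split(whole, 40)
print(left.lo, left.hi, left.successors[0].lo, left.successors[0].hi)      # 0 40 0 40
print(right.lo, right.hi, right.successors[0].lo, right.successors[0].hi)  # 40 100 40 100
```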

While this is certainly possible, it forces a further design decision for
the executive software. PAX computation splitting was demand-driven by the
presence of an idle worker. It was felt that the delay while splitting a task
description was acceptable; however, the additional delays of splitting
queued successor computation descriptions may be unacceptable.

Two possible solutions exist. One possibility is to presplit the tasks
before idle workers present themselves to the executive. This would allow the
executive to work ahead in otherwise idle time. Alternatively, the splitting
of a computation could generate a successor-splitting task that could be
quickly queued for later attention when the executive would again be idle.

The successor computation description could be removed from the current
computation description and included in the successor-splitting task
information. When the successor-splitting task is executed, the successor
computation could be split and requeued to the appropriate current computation
descriptions.
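The deferred successor-splitting task can be sketched in Python. The `Executive` class and its methods are hypothetical and assume identity mapping, with queued successors carrying no successors of their own. One design choice here is not from the text: completion processing first flushes any pending splitting tasks, so that a piece finishing before the executive becomes idle still releases the right successors.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Desc:
    """Description of granules lo..hi-1 of one phase (half-open range)."""
    phase: str
    lo: int
    hi: int
    successors: list = field(default_factory=list)


class Executive:
    def __init__(self):
        self.deferred = deque()  # successor-splitting tasks awaiting idle time

    def split_for_worker(self, desc, k):
        """Fast path while a worker waits: split only `desc` itself."""
        left = Desc(desc.phase, desc.lo, k)
        right = Desc(desc.phase, k, desc.hi)
        if desc.successors:
            # Remove the successors and package them in a splitting task.
            self.deferred.append((desc.successors, k, left, right))
            desc.successors = []
        return left, right

    def idle(self):
        """Run deferred successor splits (identity mapping assumed; queued
        successors are assumed to have no successors of their own)."""
        while self.deferred:
            succs, k, left, right = self.deferred.popleft()
            for s in succs:
                left.successors.append(Desc(s.phase, s.lo, k))
                right.successors.append(Desc(s.phase, k, s.hi))

    def complete(self, desc):
        """Completion processing. Pending splits are flushed first so that no
        enablement is lost; returns the successors released by `desc`."""
        self.idle()
        released, desc.successors = desc.successors, []
        return released


ex = Executive()
whole = Desc("p1", 0, 10, [Desc("p2", 0, 10)])
left, right = ex.split_for_worker(whole, 4)
print(len(ex.deferred), left.successors)              # 1 []
released = ex.complete(left)
print([(s.lo, s.hi) for s in released])               # [(0, 4)]
print([(s.lo, s.hi) for s in right.successors])       # [(4, 10)]
```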

Management of indirectly (both forward and reverse) mapped successor
computations is a good deal more interesting. The description of the successor
computation cannot simply be queued to the description of the current
computation since there is no guarantee of the enablement relationship.
Additionally, it would seem wise to get the current phase into execution
without the delay of constructing the information necessary for enabling
successor computations. Both forward and reverse indirection would seem well
handled by much the same mechanisms, since the only significant difference is
the direction of the indirection. Each leads naturally to a list of current
phase granules that must be completed to enable a particular successor phase
granule.

It would seem appropriate to identify a subset group of successor-phase
granules that are to be the subject of the enablement operation so as to avoid
solving an unnecessarily large enablement problem. Once this subset has been
identified, the current-phase granules that enable the successor subset can be
identified. Since these are not necessarily the current-phase granules that
would be naturally selected by the scheduling mechanism, they should be split
into individual descriptions and placed in the waiting computation queue in
such a manner as to elevate their computational priority.

It is important to note that the description of the successor subset
cannot simply be queued to any one of the identified current-phase granules,
since it is enabled not by the completion of any one such granule but by the
completion of all the identified granules. This enablement on completion of
all identified current-phase granules can be handled by any number of simple
mechanisms. For instance, during completion processing, a status bit (set when
the current-phase granules were identified and split into individual
descriptions) can be checked and, if it is set, an enablement counter
decremented. When the enablement counter reaches zero, it can be taken as a
signal that the successor-phase granules are computable.
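The status-bit and enablement-counter scheme can be sketched in Python. The `SubsetEnabler` class is hypothetical, and it models current-phase granules as plain integer ids already split into individual entries of a waiting deque. Required granules still waiting are moved to the front to raise their priority; each status bit is cleared when counted so that a granule is counted only once.

```python
from collections import deque


class SubsetEnabler:
    """Hypothetical bookkeeping for one indirectly mapped successor subset.

    `required` holds the current-phase granules (here, plain integer ids) that
    must all complete before the successor subset is computable.
    """

    def __init__(self, required, waiting):
        self.status = {g: True for g in required}     # status bits, set now
        self.counter = len(self.status)               # enablement counter
        # Move required granules that are still waiting to the front of the
        # waiting queue to elevate their priority (ascending order kept).
        for g in sorted(self.status, reverse=True):
            if g in waiting:
                waiting.remove(g)
                waiting.appendleft(g)

    def on_complete(self, granule):
        """Completion processing; True once the successor subset is computable."""
        if self.status.get(granule):
            self.status[granule] = False  # clear the bit so it counts only once
            self.counter -= 1
            return self.counter == 0
        return False


waiting = deque(range(6))
enabler = SubsetEnabler([4, 2], waiting)
print(list(waiting))                                        # [2, 4, 0, 1, 3, 5]
print([enabler.on_complete(g) for g in (0, 2, 2, 4)])       # [False, False, False, True]
```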


CONCLUDING REMARKS 

This paper has discussed the possibilities for overlapping parallel
computations in a general purpose parallel-computation environment so as to
minimize loss of computational resources. Practical experience with
PAX/CASPER, a parallel Navier-Stokes solver, suggests that simple and
plausible steps could provide such overlapping in 68 percent of the
computational phases and that, with extended effort, more than 80 percent of
the computational phases (18 of 22) are amenable to some form of phase
overlapping.